-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add reading grib files for stats #33
base: main
Are you sure you want to change the base?
Conversation
andleuth
commented
Jul 19, 2024
•
edited by huppd
Loading
edited by huppd
- add reading grib files for stats computation
- use pinned environment for CI
- make conda necessary for environment creation
Hi @andleuth, I looked at the code and added some comments (see above). I also tried to follow the usual ICON+probtest manual workflow (see here) with this branch, however, I am encountering some errors when I try to generate the stats, both with (for the GPU results to check) and without (for the CPU references) the
I am not sure if you already encountered something similar? If yes, can you recall what the solution was? This reminds me of the issue you had with the steps' units. PS: I tested this on mch_icon-ch1 with 1-hourly output. |
Hi @stelliom , the error occured because I did not activate the conda environment before setting the ECCODES_DEFINITION_PATH. Thanks for your hint! |
A small comment: It would be nice to have that new feature covered with a unittest. |
Now, I propose these as next steps:
|
…nto reading_grib_files
|
||
|
||
# Setting ECCODES_DEFINITION_PATH: | ||
${CONDA} activate ${ENV_NAME} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this also tested with the -m
option? I.e. where CONDA=mamba
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, I don't think so. so maybe better to remove this option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmmh, I am not sure if we ever use mamba for probtest. I guess that is coming from the blueprint, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Judging from line 53 of this file, ${CONDA} activate/deactivate
is probably an issue when using mamba, so we should at least change those commands to just conda activate/deactivate
. The rest is probably fine, judging from the code above.
However, if we don't care about mamba here, then I agree that we could actually remove it entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it comes from the blueprint. Then lets keep it lean and remove the option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok 👍
@@ -11,6 +11,7 @@ | |||
"member_type": "{{member_type}}", | |||
"factor": 5, | |||
"file_specification": [{ | |||
"GRIB": {"format": "GRIB", "time_dim": "step", "horizontal_dims": ["values"], "var_excl": ["tlon", "tlat", "vlon", "vlat", "ulon", "ulat", "h", "slor", "anor", "isor", "sdor"], "fill_value_key": "_FillValue"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think at some point we may want to come up with a more flexible solution for var_excl
, so that we don't have to hardcode it here. But I think this is not in the scope of this PR 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am open for suggestions. Maybe worth to raise an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot remember now what the actual error was if those variables were not excluded, but at the time we were pretty sure that we did not need to test those values, since they seemed to be dimensions/coordinates. If I remember correctly from what Andrea told me, in GRIB there is no distinction between variables and dimensions/coordinates as in NetCDF, do you know if this is true?
I think that the names of these variables are probably not going to be the same in every GRIB file, so the current solution might be very specific to our needs and therefore pretty fragile in case someone wants to use probtest outside ICON or even with different experiments. If we could find a way to distinguish between variables and these "dimensional fields" within the code, then we could simply fill the short_names_excl
list using such criterion. I am not an expert with GRIB files, but I guess these variables (judging from their names) probably do not change with time, therefore they might not have a time dimension. Maybe we could leverage this and use it as a criterion to exclude them from probtest, assuming we are never interested in testing variables that are constant in time, which seems reasonable to me. Another uglier option could be to use whatever error those variables were causing to our advantage by handling it in a try-statement and adding the name of the variable to the short_names_excl
list in case of error.
I think we could raise an issue for this, but maybe it is not super urgent. I guess it depends how much the list of excluded variables changes from file to file and how many different people will attempt to use probtest of GRIB files.
Sorry for the wall of text 😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the wall of text :)
I am not a grib expert, but I would assume it is true. We had a similar problem with the fof files, and there we just included files that have the specific horizontal dimensions, which excludes all the dimension variables.
But I think for now I would just put it all in a desperate issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree! Will you open the issue or should I do it?
@@ -185,6 +192,58 @@ def unify_time_index(fid_dfs): | |||
return fid_dfs_out | |||
|
|||
|
|||
def adjust_time_index(fid_dfs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this required as part of this PR? I.e. for reading GRIB files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well. It is a workaround. I am not 100% satisfied with it. So there is unify_time_index
which converts the times to integer timesteps. This works not for the case of grib files that are written every 10s. So I added this adjust_time_index
routine which will fix it and just overwrites with the time steps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, ok. However, I am not sure I understand why the fact that GRIB files are written every 10s causes an issue, can you not just replace the times with integer timesteps as well as for NetCDF?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @huppd,
I looked at your changes compared to what I already reviewed for Andrea. I added a couple of comments/doubts, but in general this looks fine to me. However, I have to admit I am not super sure about the Python tests, since I am not an expert on that.
I will approve the PR now, since I will be on vacation for the rest of the week. This way you will still be able to merge in my absence. Otherwise, I will come back to this next Monday 🙂
Cheers,
Mikael
Hi @stelliom |